nlp_architect.data.intent_datasets.TabularIntentDataset

class nlp_architect.data.intent_datasets.TabularIntentDataset(train_file, test_file, sentence_length=30, word_length=12)

Tabular intent/slot tags dataset loader. Compatible with many sequence tagging datasets (ATIS, CoNLL, etc.). The data must be in tabular format, where:

  • each line holds one word with its tag annotation and intent type, separated by tabs: <token> <tag_label> <intent>

  • sentences are separated by an empty line
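To make the expected layout concrete, here is a minimal sketch of a reader for this format. It is not the library's loader: the helper name `read_tabular_sentences` is hypothetical, and it assumes every non-empty line has exactly three tab-separated fields and that each line of a sentence repeats the same intent (the first one is kept).

```python
from typing import Iterable, List, Optional, Tuple

# (tokens, tags, intent) for one sentence
Sentence = Tuple[List[str], List[str], Optional[str]]


def read_tabular_sentences(lines: Iterable[str]) -> List[Sentence]:
    """Group tab-separated '<token> <tag_label> <intent>' lines into sentences.

    Hypothetical helper, not part of nlp_architect. Assumes exactly three
    tab-separated fields per non-empty line and that every line of a sentence
    repeats the same intent (the first one is kept).
    """
    sentences: List[Sentence] = []
    tokens: List[str] = []
    tags: List[str] = []
    intent: Optional[str] = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # An empty line closes the current sentence, if one is open.
            if tokens:
                sentences.append((tokens, tags, intent))
            tokens, tags, intent = [], [], None
            continue
        token, tag, line_intent = line.split("\t")
        tokens.append(token)
        tags.append(tag)
        if intent is None:
            intent = line_intent
    if tokens:  # the file may not end with an empty line
        sentences.append((tokens, tags, intent))
    return sentences


if __name__ == "__main__":
    sample = [
        "show\tO\tflight",
        "boston\tB-toloc.city_name\tflight",
        "",
        "fares\tO\tairfare",
    ]
    for sentence in read_tabular_sentences(sample):
        print(sentence)
```

Passing an open file object (for example `open(path, encoding="utf-8")`) works the same way, since iterating over it yields lines.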

Parameters
  • train_file (str) – path to train set file

  • test_file (str) – path to test set file

  • sentence_length (int) – maximum sentence length, in tokens (default 30)

  • word_length (int) – maximum word length, in characters (default 12)
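The two length parameters bound the shapes of the encoded arrays. A plausible reading is that each sentence is truncated or padded to `sentence_length` word ids and each word to `word_length` character ids. The sketch below illustrates that idea only; the helper `encode_sentence`, the ids 0 (padding) and 1 (unknown), and right-side padding are all assumptions, not the loader's actual code.

```python
from typing import Dict, List, Sequence, Tuple

import numpy as np

PAD_ID = 0  # assumed id for padding positions
UNK_ID = 1  # assumed id for out-of-vocabulary words or characters


def pad_ids(ids: List[int], length: int) -> List[int]:
    """Truncate or right-pad a list of ids to exactly `length` items."""
    ids = ids[:length]
    return ids + [PAD_ID] * (length - len(ids))


def encode_sentence(
    tokens: Sequence[str],
    word_vocab: Dict[str, int],
    char_vocab: Dict[str, int],
    sentence_length: int = 30,
    word_length: int = 12,
) -> Tuple[np.ndarray, np.ndarray]:
    """Encode one sentence as fixed-shape id arrays.

    Returns word ids of shape (sentence_length,) and character ids of shape
    (sentence_length, word_length). Hypothetical helper, not nlp_architect's API.
    """
    word_ids = pad_ids([word_vocab.get(t, UNK_ID) for t in tokens], sentence_length)
    char_rows = [
        pad_ids([char_vocab.get(c, UNK_ID) for c in t], word_length)
        for t in tokens[:sentence_length]
    ]
    # Fill the remaining word slots with all-padding character rows.
    char_rows += [[PAD_ID] * word_length for _ in range(sentence_length - len(char_rows))]
    return np.array(word_ids), np.array(char_rows)
```

Fixed shapes let every sentence in a split be stacked into a single array, which is one reason a loader would expose these two limits as constructor parameters.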

__init__(train_file, test_file, sentence_length=30, word_length=12)

Initialize self. See help(type(self)) for accurate signature.

Methods

  • __init__(train_file, test_file[, …]) – Initialize self.

Attributes

  • char_vocab – word character vocabulary

  • char_vocab_size – char vocabulary size

  • files – names of the dataset splits

  • intent_size – intent label vocabulary size

  • intents_vocab – intent labels vocabulary

  • label_vocab_size – tag label vocabulary size

  • tags_vocab – tag labels vocabulary

  • test_set – test set

  • train_set – train set

  • word_vocab – token vocabulary

  • word_vocab_size – word (token) vocabulary size

char_vocab

word character vocabulary

Type

dict

char_vocab_size

char vocabulary size

Type

int

files = ['train', 'test']

names of the dataset splits

intent_size

intent label vocabulary size

Type

int

intents_vocab

intent labels vocabulary

Type

dict

label_vocab_size

tag label vocabulary size

Type

int

tags_vocab

tag labels vocabulary

Type

dict

test_set

test set

Type

tuple of numpy.ndarray

train_set

train set

Type

tuple of numpy.ndarray

word_vocab

token vocabulary

Type

dict

word_vocab_size

word (token) vocabulary size

Type

int
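The vocabulary attributes are documented as plain dicts paired with integer sizes. The sketch below shows one way such a dict and its size could be derived from training sentences. The helpers `build_vocab` and `vocab_size` are hypothetical, and reserving ids 0 and 1 (for padding and unknown items) and counting them in the size is an assumption; the real loader's id scheme may differ.

```python
from typing import Dict, Iterable


def build_vocab(sequences: Iterable[Iterable[str]], start_id: int = 2) -> Dict[str, int]:
    """Map each distinct item to an integer id in order of first appearance.

    Ids below `start_id` are left free for reserved symbols (assumed here:
    0 = padding, 1 = unknown). Hypothetical helper, not nlp_architect's API.
    """
    vocab: Dict[str, int] = {}
    for seq in sequences:
        for item in seq:
            if item not in vocab:
                vocab[item] = start_id + len(vocab)
    return vocab


def vocab_size(vocab: Dict[str, int], start_id: int = 2) -> int:
    """Number of ids including the reserved ones (e.g. an embedding table's row count)."""
    return len(vocab) + start_id


if __name__ == "__main__":
    train_tokens = [["show", "flights"], ["show", "fares"]]
    word_vocab = build_vocab(train_tokens)
    # Iterating over each token string yields its characters.
    char_vocab = build_vocab(tok for sent in train_tokens for tok in sent)
    print(word_vocab, vocab_size(word_vocab))
    print(char_vocab, vocab_size(char_vocab))
```

The same helper could be applied to tag sequences or to one-item intent lists; whether labels should also get reserved ids is a separate design choice not settled by this reference.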